Intelligent Coordinates and Collective Intelligence: A case study in improving Simulated Annealing in the bin-packing domain
Abstract
Consider a set of non-cooperative agents acting in an environment in which each agent attempts to maximize a private utility function. As each agent maximizes its private utility, we want a global "world" utility function to be maximized in turn. This situation induces the following inverse problem: how should each agent choose its move so that, while it optimizes its private utility, the world utility is optimized as well? This problem has been considered by the theory of COllective INtelligence (COIN) (Wolpert & Tumer 2000). This paper focuses on a method for improving a class of search algorithms using intelligent learning players, based on reinforcement learning (RL), engaged in a non-cooperative game. The players, or "coordinates", use a more traditional AI approach to learning in a system in which the state of the world is largely unknown, and the effects of each agent's actions on the world are not known a priori and must be discovered. This is essentially a search through the possible policies of each coordinate of the underlying system. Search algorithms such as Simulated Annealing, Swarm Intelligence, and Genetic Algorithms, which trade off exploration against exploitation, are particularly useful for solving systems of this nature, because they search directly in the space of RL policies and try to select optimal policies which pay over time through simulation of indirect learning via optimization of the system. However, these algorithms all have one major drawback: they do not rely on the previous experiences of players in the system to help determine which actions the players should take next. This drawback led Wolpert and Tumer to develop a new method for improving search algorithms, Intelligent Coordinates (Wolpert, Tumer, & Bandari ). Intelligent Coordinates use RL to remember and "learn" how to better solve the system, by applying traditional RL techniques to the exploration stage of these search algorithms.
The algorithm improves the bin-packing simulation by an order of magnitude, and the improvement increases linearly with the number of items added. We present a more detailed explanation of the Intelligent Coordinates algorithm, and discuss two implementations of the bin-packing simulation that we have constructed in Java: one that solves it with Simulated Annealing (SA), and one that uses Intelligent Coordinates (IC). Further, we contribute an efficient method for processing time-weighted sums, which are needed in the simulation. We also formalize a method for calculating the time-weighted probabilities of picking a particular bin B in the simulation. These probabilities are used to "mask" the original SA distribution to achieve the more relevant IC probability distribution. We also present some original experiments in the bin-packing simulation that were not present in the original paper (Wolpert, Tumer, & Bandari ).
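The abstract does not say how the time-weighted sums or the masking step are computed, so the following is only a minimal sketch under stated assumptions, written in Python for brevity rather than the Java of the implementations described. It assumes that a reward observed `dt` steps ago is weighted by `exp(-dt / tau)`, and that the mask is a Boltzmann factor applied to a base SA-style proposal over bins. The names `DecayedSum`, `masked_distribution`, `tau`, and `temperature` are hypothetical and do not come from the paper.

```python
import math


class DecayedSum:
    """Running sum where a reward seen dt steps ago has weight exp(-dt / tau).

    The exponential weighting is an assumption made for this sketch.
    """

    def __init__(self, tau):
        self.tau = tau
        self.value = 0.0
        self.last_t = None  # time of the most recent update

    def at(self, t):
        """Return the time-weighted sum as seen from time t (t >= last update)."""
        if self.last_t is None:
            return 0.0
        return self.value * math.exp(-(t - self.last_t) / self.tau)

    def add(self, t, reward):
        """Decay the stored sum to time t once, then add the new reward.

        This costs O(1) per update instead of re-summing the whole history.
        """
        self.value = self.at(t) + reward
        self.last_t = t


def masked_distribution(base_probs, scores, t, temperature=1.0):
    """Reweight a base proposal over bins by each bin's time-weighted score.

    base_probs: probability of proposing each bin under the unmodified search.
    scores:     one DecayedSum per bin, tracking rewards from past moves to it.
    The Boltzmann form of the mask is an assumption, not the paper's formula.
    """
    weights = [
        p * math.exp(s.at(t) / temperature)
        for p, s in zip(base_probs, scores)
    ]
    total = sum(weights)
    return [w / total for w in weights]


# Example: two bins, uniform base proposal; bin 0 was rewarded recently.
scores = [DecayedSum(tau=2.0), DecayedSum(tau=2.0)]
scores[0].add(0, 1.0)
probs = masked_distribution([0.5, 0.5], scores, t=1)
```

Under these assumptions, the bin with the more recent reward receives a larger share of the proposal probability, and the advantage fades as `t` grows because the stored sum decays toward zero.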
Similar resources
Applying multiagent reinforcement learning to distributed function optimization problems
Consider a set of non-cooperative agents acting in an environment in which each agent attempts to maximize a private utility function. As each agent maximizes its private utility we desire a global "world" utility function to in turn be maximized. The inverse problem induced from this situation is the following: How does each agent choose his move so that while he optimizes his private utility,...
Improving Search Algorithms by Using Intelligent Coordinates
We consider algorithms that maximize a global function G in a distributed manner, using a different adaptive computational agent to set each variable of the underlying space. Each agent eta is self-interested; it sets its variable to maximize its own function g(eta). Three factors govern such a distributed algorithm's performance, related to exploration/exploitation, game theory, and machine le...
Improving Simulated Annealing by Recasting it as a Non-Cooperative Game
The game-theoretic field of COllective INtelligence (COIN) concerns the design of computer-based players engaged in a non-cooperative game so that as those players pursue their self-interests, a pre-specified global goal for the collective computational system is achieved "as a side-effect". Previous implementations of COIN algorithms have outperformed conventional techniques by up to several orders of magnitude, on domains rangi...
Comparison of Meta-Heuristic Algorithms for Clustering Rectangles
In this paper we consider a simplified version of the stock cutting (two-dimensional bin packing) problem. We compare three meta-heuristic algorithms (genetic algorithm (GA), tabu search (TS) and simulated annealing (SA)) when applied to this problem. The results show that tabu search and simulated annealing produce good quality results. This is not the case with the genetic algorithm. The prob...
A SAIWD-Based Approach for Simultaneous Reconfiguration and Optimal Siting and Sizing of Wind Turbines and DVR units in Distribution Systems
In this paper, a combination of simulated annealing (SA) and intelligent water drops (IWD) algorithm is used to solve the nonlinear/complex problem of simultaneous reconfiguration with optimal allocation (size and location) of wind turbine (WT) as a distributed generation (DG) and dynamic voltage restorer (DVR) as a distributed flexible AC transmission systems (DFACT) unit in a distribution sys...